Multimodal Pivots for Image Caption Translation
نویسندگان
چکیده
We present an approach to improve statistical machine translation of image descriptions by multimodal pivots defined in visual space. Image similarity is computed by a convolutional neural network and incorporated into a target-side translation memory retrieval model where descriptions of most similar images are used to rerank translation outputs. Our approach does not depend on the availability of in-domain parallel data and achieves improvements of 1.4 BLEU over strong baselines.
منابع مشابه
CUNI System for the WMT17 Multimodal Traslation Task
In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with b...
متن کاملCUNI System for the WMT17 Multimodal Translation Task
In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with b...
متن کاملImage2Text: A Multimodal Caption Generator
In this work, we showcase the Image2Text system, which is a real-time captioning system that can generate human-level natural language description for any input image. We formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different from...
متن کاملGenerating Image Descriptions using Multilingual Data
In this paper we explore several neural network architectures for the WMT 2017 multimodal translation sub-task on multilingual image caption generation. The goal of the task is to generate image captions in German, using a training corpus of images with captions in both English and German. We explore several models which attempt to generate captions for both languages, ignoring the English outp...
متن کاملMultimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
In state-of-the-art Neural Machine Translation, an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, whe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1601.03916 شماره
صفحات -
تاریخ انتشار 2016